Pronunciation modeling for large vocabulary conversational speech recognition
Abstract
In this paper, we address the issue of deriving and using more realistic pronunciations to represent words spoken in natural conversational speech. Previous approaches include using automatic phoneme-based rule-learning techniques [1, 2, 7], linguistic transformation rules [4, 8], and a phonetically hand-labelled corpus [3] to expand the number of pronunciation variants per word. While rule-based approaches have the advantage of being easily extensible to infrequent or unobserved words, they suffer from the problem of overgeneralization. Using hand-transcribed data, one can obtain a more concise set of new pronunciations, but this set cannot be extended to unobserved or infrequently occurring words. In this paper, we adopt the hand-labelled corpus scheme to improve pronunciations for frequent multiwords and single words occurring in the training data, while using the rule-based techniques to learn pronunciation variants and their weights for the infrequent words. Furthermore, we experiment with a new approach for speaker-dependent pronunciation modeling. The newly expanded dictionaries are evaluated on the Switchboard and Callhome corpora, giving a slight reduction in word recognition error rate.
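The hybrid scheme described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the `min_count` threshold, the context-free form of the rewrite rules, and the weighting (canonical form scored 1.0, then normalized) are all assumptions made for illustration. Frequent words take their variants and weights from hand-labelled tokens; all other words get variants generated by rules from the baseline pronunciation.

```python
from collections import Counter, defaultdict


def observed_variants(tokens, min_count):
    """Relative-frequency weights for words seen at least min_count times.

    tokens: iterable of (word, phone_sequence) pairs from hand-labelled data.
    """
    by_word = defaultdict(Counter)
    for word, phones in tokens:
        by_word[word][tuple(phones)] += 1
    lexicon = {}
    for word, counts in by_word.items():
        total = sum(counts.values())
        if total >= min_count:
            lexicon[word] = {p: c / total for p, c in counts.items()}
    return lexicon


def rule_variants(canonical, rules):
    """Generate variants by applying one rewrite rule at one position.

    rules: hypothetical mapping phone -> list of (replacement_phones, score).
    The canonical form gets score 1.0 (an assumption); scores are normalized.
    """
    canonical = tuple(canonical)
    variants = {canonical: 1.0}
    for i, phone in enumerate(canonical):
        for replacement, score in rules.get(phone, []):
            variant = canonical[:i] + tuple(replacement) + canonical[i + 1:]
            variants[variant] = max(variants.get(variant, 0.0), score)
    total = sum(variants.values())
    return {p: w / total for p, w in variants.items()}


def build_lexicon(baseline, tokens, rules, min_count=3):
    """Rule-based variants for every baseline word, overridden by
    hand-labelled variants for words that are frequent enough."""
    lexicon = {w: rule_variants(c, rules) for w, c in baseline.items()}
    lexicon.update(observed_variants(tokens, min_count))
    return lexicon
```

With a toy baseline dictionary, a handful of hand-labelled tokens, and a single "delete t" rule, a frequent word such as "and" would receive its observed variants, while a word seen only once would fall back to the rule-generated ones.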
Similar resources
Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition
Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...
Pronunciation Modeling for Large Vocabulary Speech Recognition by Arthur
The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy for automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the pronunciations of words. Others model the pronunciati...
Rate-of-speech Modeling for Large Vocabulary Conversational Speech Recognition
Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...
Rate-dependent Acoustic Modeling for Large Vocabulary Conversational Speech Recognition
Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect automatic speech recognition (ASR) systems. To deal with these ROS effects, we propose to use parallel, rate-specific, acoustic models: one for fast speech, the other for slow speech. Rate switching is permitted at word boundaries, to allow modeling within-sentence speech rate variat...
Modeling of Pronunciation, Language and Nonverbal Units at Conversational Russian Speech Recognition
The main problems in developing a conversational Russian speech recognition system are the variability of pronunciation, free word order in sentences, and the presence of speech disfluencies. In the paper, pronunciation variability is modeled by creating multiple word transcriptions. A syntactic-statistical language model that takes into account long-distance word dependencies is proposed for Russian ...
Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition
In spontaneous conversational speech there is a large amount of variability due to accents, speaking styles and speaking rates (also known as the speaking mode) [3]. Because current recognition systems usually use only a relatively small number of pronunciation variants for the words in their dictionaries, the amount of variability that can be modeled is limited. Increasing the number of varian...